Strings

The most significant difference between Python versions 2 and 3 is in string handling.

In Python 3, all strings are Unicode strings by default, and the interpreter expects Python source files to be UTF-8 encoded.

Explaining what Unicode is lies beyond the scope of this course, but you can check

If you don't know what Unicode or encodings are, do not despair. You will find out if you ever need to, and people have lived their entire lives happily without knowing what encodings are.

Suffice it to say that it is safe to use Unicode characters in strings, and Unicode letters in variable names, in Python 3.


In [ ]:
ananasakäämä = "höhö 电脑"
print(ananasakäämä)
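
Because a Python 3 string is a sequence of Unicode characters (code points), len() counts characters, not bytes. The sketch below encodes the same text as UTF-8 (chosen here only for illustration) to show that the bytes form is longer: "ö" takes two bytes and each of the two Chinese characters takes three.

```python
text = "höhö 电脑"

# len() on a str counts characters (code points), not bytes
print(len(text))                  # 7

# the UTF-8 encoded form is a bytes object, and it is longer
encoded = text.encode("utf-8")
print(type(encoded).__name__)     # bytes
print(len(encoded))               # 13
```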

Extra material

If you want to represent a character that for some reason you can't type on your system, or if you want to keep your source code ASCII-only (not necessarily a bad idea), you can enter non-ASCII characters using escape sequences:


In [ ]:
print("\N{GREEK CAPITAL LETTER DELTA}") # using the character name
print("\u0394")                         # using a 16-bit hex value
print("\U00000394")                     # using a 32-bit hex value
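
All three escape forms produce the same one-character string. A quick check, using the built-ins ord() and chr() to go between a character and its numeric code point:

```python
by_name = "\N{GREEK CAPITAL LETTER DELTA}"
by_16bit = "\u0394"
by_32bit = "\U00000394"

# all three escapes produce the same one-character string
print(by_name == by_16bit == by_32bit)   # True
print(len(by_name))                      # 1

# ord() gives the code point, chr() goes back the other way
print(ord(by_name))                      # 916, i.e. 0x394
print(chr(0x394) == by_name)             # True
```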

If you have a bytes object, you can call its decode() method with an encoding as the argument to get a string. Conversely, you can encode() a string into bytes.
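
A minimal round trip, assuming UTF-8 as the encoding. The last line decodes the same bytes with the wrong encoding (Latin-1) to show why the encoding argument matters: no error is raised, but the text comes out garbled.

```python
original = "höhö"

# str -> bytes: encode() takes the encoding name as an argument
data = original.encode("utf-8")
print(data)                      # b'h\xc3\xb6h\xc3\xb6'

# bytes -> str: decode() with the same encoding restores the text
restored = data.decode("utf-8")
print(restored == original)      # True

# decoding with a different encoding silently gives the wrong characters
wrong = data.decode("latin-1")
print(wrong)                     # hÃ¶hÃ¶
```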

Creating Strings

Both single '' and double "" quotes denote a string. They are equally valid, and which one to use is a matter of preference. It is recommended to be consistent within the same file, though.

It is permissible to use single quotes inside a double-quoted string or double quotes inside a single-quoted string.

If you want to use the same kind of quote inside a string, you must escape it with a backslash \. Because the backslash is the escape character, a literal backslash must be written as \\ to produce a single backslash in the string.


In [ ]:
permissible = "la'l'a'a"
print(permissible)
permissible = 'la"l"a"a'
print(permissible)
permissible = "\"i am a quote \\ \""
print(permissible)
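
To confirm that each escape sequence stands for a single character in the resulting string, you can check lengths and counts directly:

```python
# an escaped backslash is a single character in the resulting string
backslash = "\\"
print(len(backslash))                 # 1

# an escaped quote is also a single character
quote = "\""
print(quote == '"')                   # True

# the string from the cell above contains exactly one backslash
permissible = "\"i am a quote \\ \""
print(permissible.count("\\"))        # 1
```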

There are several ways to write a string across multiple lines of source code:

  • multiline string notation using triple quotes, which keeps the line breaks in the string
  • multiple consecutive string literals inside parentheses, which are joined into one string with no line breaks added

In [ ]:
permissible = """
i am a multi
line
string
"""
print(permissible)

In [ ]:
permissible = ("i"
              " am" # note the space before the word inside the string
              ' a'
              " multiline"
              ' string')
print(permissible)
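
The two approaches produce different strings. This sketch rebuilds both from the cells above and compares them: the triple-quoted version keeps every line break (including the one right after the opening quotes), while the parenthesized literals are simply concatenated.

```python
triple = """
i am a multi
line
string
"""

joined = ("i"
          " am"
          ' a'
          " multiline"
          ' string')

# triple-quoted strings keep every line break, including the first and last
print(triple.count("\n"))        # 4

# adjacent literals are simply concatenated, no newlines are added
print(joined)                    # i am a multiline string
print("\n" in joined)            # False
```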

String wrangling

First, it is essential to remember that strings are immutable: whatever you do with a string, the original will not change. Most string methods return a new, modified string or some other object.
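
A small sketch of what immutability means in practice: a method such as upper() returns a new string and leaves the original alone, and trying to assign to a single character raises a TypeError. To "change" a string you build a new one and rebind the name.

```python
word = "fox"
shouted = word.upper()

# upper() returned a new string; the original is unchanged
print(word, shouted)             # fox FOX

# item assignment is not allowed on strings
try:
    word[0] = "b"
except TypeError as err:
    print("TypeError:", err)

# to "change" a string, build a new one and rebind the name
word = "b" + word[1:]
print(word)                      # box
```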

If you have any programming experience, many of the following examples will seem familiar.

As always, a complete list can be found in the documentation.


In [ ]:
example = "The quick brown fox jumps over the lazy dog  "

In [ ]:
# the split() method splits on whitespace by default
example.split()

It can also be given a separator string as an argument. The return value is a list, so it can be indexed with [].


In [ ]:
example.split("e")[0]
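
A few more split() variations on the same example string. The second argument (maxsplit) limits how many splits are made. Note also the difference between calling split() with no argument, which treats any run of whitespace as one separator and ignores leading and trailing whitespace, and split(" "), which splits on every single space and so keeps empty strings.

```python
example = "The quick brown fox jumps over the lazy dog  "

# splitting on "e" cuts the string at every "e"
print(example.split("e"))
# ['Th', ' quick brown fox jumps ov', 'r th', ' lazy dog  ']

# maxsplit limits the number of splits
print(example.split("e", 1))
# ['Th', ' quick brown fox jumps over the lazy dog  ']

# no argument: trailing whitespace is dropped entirely
print(example.split()[-1])       # dog

# splitting on " " keeps empty strings between consecutive spaces
print(example.split(" ")[-3:])   # ['dog', '', '']
```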

Strings can be indexed and sliced using the same notation as lists.


In [ ]:
example[5:10]
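
Some further indexing and slicing on the same string: indexing returns a one-character string, omitted slice bounds default to the start or end, negative indices count from the end, a step of -1 reverses the string, and slicing past the end returns an empty string instead of raising an error.

```python
example = "The quick brown fox jumps over the lazy dog  "

print(example[0])               # T (a single character is itself a str)
print(example[4:9])             # quick
print(example[:3])              # The (start defaults to 0)
print(repr(example[-5:]))       # 'dog  ' (negative indices count from the end)
print(example.strip()[::-1])    # god yzal eht revo spmuj xof nworb kciuq ehT
print(repr(example[100:]))      # '' (out-of-range slices do not raise)
```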

The strip() method removes all leading and trailing characters that belong to a given set of characters, defaulting to whitespace. This is surprisingly often needed.


In [ ]:
example.strip()
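
The argument to strip() is treated as a set of characters, not as a prefix or suffix: every character from that set is removed from both ends until a character outside the set is reached. The related lstrip() and rstrip() methods work on only one end.

```python
example = "The quick brown fox jumps over the lazy dog  "

# with no argument, all leading and trailing whitespace is removed
print(repr(example.strip()))
# 'The quick brown fox jumps over the lazy dog'

# the argument is a set of characters to remove from both ends
print("xxhixx".strip("x"))           # hi
print("abcHELLOcba".strip("abc"))    # HELLO

# lstrip() and rstrip() work on one end only
print(repr("  padded  ".lstrip()))   # 'padded  '
print(repr("  padded  ".rstrip()))   # '  padded'
```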

Strings can be converted to lower case with lower() or to upper case with upper().


In [ ]:
example.upper()
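
A common use of these methods is case-insensitive comparison. The sketch also shows casefold(), a related str method intended for caseless matching that handles some non-ASCII cases lower() does not, such as the German "ß".

```python
a = "Python"
b = "PYTHON"

# direct comparison is case-sensitive
print(a == b)                    # False

# normalizing both sides first makes the comparison case-insensitive
print(a.lower() == b.lower())    # True

# casefold() goes further than lower() for some Unicode characters
print("Straße".lower())          # straße
print("Straße".casefold())       # strasse
```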

You can search for a substring with the find() method. It returns the lowest index at which the substring starts, or -1 if it is not found.


In [ ]:
example.find("ick")
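
A few related ways to search, using the same example string: find() returns -1 when nothing matches, rfind() searches from the right, the in operator answers a plain yes/no question, and index() behaves like find() except that it raises a ValueError when the substring is missing.

```python
example = "The quick brown fox jumps over the lazy dog  "

print(example.find("ick"))     # 6
print(example.find("cat"))     # -1, meaning "not found"
print(example.find("o"))       # 12, the first match
print(example.rfind("o"))      # 41, the last match
print("fox" in example)        # True

# index() raises instead of returning -1
try:
    example.index("cat")
except ValueError:
    print("not found")
```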

Sometimes you need to know whether a string consists only of digits, or of numeric characters.


In [ ]:
"124".isdigit()
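
isdigit() is stricter than it may look: signs, decimal points and empty strings all return False, and isnumeric() additionally accepts other Unicode numeric characters such as "½". If what you actually want to know is whether a string can be converted to an integer, one common pattern is to simply try int(); the helper is_int below is a hypothetical name used only for illustration.

```python
print("124".isdigit())     # True
print("-124".isdigit())    # False: the minus sign is not a digit
print("3.5".isdigit())     # False: neither is the decimal point
print("".isdigit())        # False: an empty string has no digits

# isnumeric() also accepts other Unicode numeric characters
print("½".isnumeric())     # True
print("½".isdigit())       # False


def is_int(s):
    """Hypothetical helper: True if int() can parse the string."""
    try:
        int(s)
        return True
    except ValueError:
        return False


print(is_int("-124"))      # True
print(is_int("3.5"))       # False
```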